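With the spread of media streaming, many video streaming services continually purchase new video content to tap its potential profit. Newly added content must therefore be handled properly so that it can be recommended to suitable users. In this paper, we address the new-item cold-start problem by exploring the potential of various deep learning features for video recommendation. The deep learning features investigated include features that capture visual appearance, audio, and motion information from video content. We also explore different fusion methods to evaluate how well these feature modalities can be combined to fully exploit the complementary information they capture. Experiments on a real-world video dataset for movie recommendation show that deep learning features outperform hand-crafted features. In particular, recommendations generated with deep learning audio features and action-centric deep learning features are superior to those generated with MFCC and state-of-the-art IDT features. In addition, combining various deep learning features with hand-crafted features and textual metadata yields a significant improvement in recommendations compared with combining only the former.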
translated by Google Translate
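The abstract above does not say which fusion methods were compared. As a rough illustration only, the sketch below contrasts two common options for combining per-modality feature vectors when scoring a new (cold-start) item against an item a user already liked: early fusion (concatenate the vectors, then compare once) and late fusion (compare each modality separately, then average the scores). The function names, the dictionary layout, and the use of cosine similarity are all assumptions made for this example, not details taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def early_fusion_score(new_item, liked_item):
    """Concatenate all modality vectors (in a fixed order), then compare once."""
    modalities = sorted(new_item)
    a = [v for m in modalities for v in new_item[m]]
    b = [v for m in modalities for v in liked_item[m]]
    return cosine(a, b)


def late_fusion_score(new_item, liked_item, weights=None):
    """Compare each modality separately, then take a weighted average of the scores."""
    modalities = sorted(new_item)
    if weights is None:
        weights = {m: 1.0 / len(modalities) for m in modalities}
    return sum(weights[m] * cosine(new_item[m], liked_item[m]) for m in modalities)


# Hypothetical toy features: each item maps a modality name to a feature vector.
new_video = {"audio": [1.0, 0.0], "visual": [1.0, 0.0]}
liked_video = {"audio": [1.0, 0.0], "visual": [0.0, 1.0]}

print(early_fusion_score(new_video, liked_video))
print(late_fusion_score(new_video, liked_video))
```

Late fusion makes it easy to weight modalities differently, for example to favor audio when it is more predictive, whereas early fusion lets vectors with larger norms dominate unless they are normalized first.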
Because of the environmental impact of the construction industry, repurposing existing buildings and making them more energy-efficient has become a high-priority issue. However, land developers have a legitimate concern about these buildings' state of conservation. For that reason, infrared thermography has been used as a powerful tool to characterize the state of conservation by detecting pathologies such as cracks and humidity. Thermal cameras detect the radiation emitted by any material and translate it into temperature-color-coded images. Abnormal temperature changes may indicate the presence of pathologies; however, reading thermal images is not always straightforward. This research project aims to combine infrared thermography and machine learning (ML) to help stakeholders determine the viability of reusing existing buildings by identifying their pathologies and defects more efficiently and accurately. In this phase of the project, we used a deep convolutional neural network (DCNN) image classification model to differentiate three levels of cracks in one particular building. The model's accuracy was compared across MSX and thermal images acquired from two distinct thermal cameras and fused images (formed through multisource information) to test the influence of the input data and network on the detection results.
In recent years, image and video delivery systems have begun integrating deep learning super-resolution (SR) approaches, leveraging their unprecedented visual enhancement capabilities while reducing reliance on networking conditions. Nevertheless, deploying these solutions on mobile devices still remains an active challenge as SR models are excessively demanding with respect to workload and memory footprint. Despite recent progress on on-device SR frameworks, existing systems either penalize visual quality, lead to excessive energy consumption or make inefficient use of the available resources. This work presents NAWQ-SR, a novel framework for the efficient on-device execution of SR models. Through a novel hybrid-precision quantization technique and a runtime neural image codec, NAWQ-SR exploits the multi-precision capabilities of modern mobile NPUs in order to minimize latency, while meeting user-specified quality constraints. Moreover, NAWQ-SR selectively adapts the arithmetic precision at run time to equip the SR DNN's layers with wider representational power, improving visual quality beyond what was previously possible on NPUs. Altogether, NAWQ-SR achieves an average speedup of 7.9x, 3x and 1.91x over the state-of-the-art on-device SR systems that use heterogeneous processors (MobiSR), CPU (SplitSR) and NPU (XLSR), respectively. Furthermore, NAWQ-SR delivers an average of 3.2x speedup and 0.39 dB higher PSNR over status-quo INT8 NPU designs, but most importantly mitigates the negative effects of quantization on visual quality, setting a new state-of-the-art in the attainable quality of NPU-based SR.
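To make the quality figures above more concrete, here is a minimal sketch of how PSNR (in dB) is commonly computed and how arithmetic precision affects it. It uses plain symmetric uniform quantization on a synthetic signal; it is not NAWQ-SR's hybrid-precision technique, and the helper names `psnr` and `quantize_roundtrip` are hypothetical names chosen for this example.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length value sequences."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


def quantize_roundtrip(values, bits):
    """Quantize floats to signed `bits`-bit integers (symmetric, one scale) and back."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    # Round to the nearest integer level, clamp to the representable range, rescale.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


# Synthetic stand-in for activations or pixel values in the range [-1, 1].
signal = [math.sin(0.1 * i) for i in range(1000)]
psnr_int8 = psnr(signal, quantize_roundtrip(signal, 8), max_value=1.0)
psnr_int16 = psnr(signal, quantize_roundtrip(signal, 16), max_value=1.0)
print(f"8-bit: {psnr_int8:.2f} dB, 16-bit: {psnr_int16:.2f} dB")
```

Wider integer formats leave a smaller rounding error and therefore a higher PSNR, which is the general trade-off between precision and quality that the abstract refers to.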
In the last decade, exponential data growth has fueled the capacity of machine learning-based algorithms and enabled their use in daily-life activities. This improvement is also partially explained by the advent of deep learning techniques, i.e., stacks of simple architectures that end up forming more complex models. Although both factors produce outstanding results, they also pose drawbacks for the learning process, as training complex models over large datasets is expensive and time-consuming. The problem is even more evident when dealing with video analysis. Some works have considered transfer learning or domain adaptation, i.e., approaches that map the knowledge from one domain to another, to ease the training burden, yet most of them operate over individual frames or small blocks of frames. This paper proposes a novel approach to map the knowledge from action recognition to event recognition using an energy-based model, denoted as Spectral Deep Belief Network. Such a model can process all frames simultaneously, carrying spatial and temporal information through the learning process. Experimental results on two public video datasets, HMDB-51 and UCF-101, demonstrate the effectiveness of the proposed model and its reduced computational burden when compared to traditional energy-based models, such as Restricted Boltzmann Machines and Deep Belief Networks.
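Robust learning is an important issue in scientific machine learning (SciML), and several works in the literature address it. However, there is a growing demand for methods that can simultaneously consider all the different uncertainty components involved in SciML model identification. This work therefore proposes a comprehensive methodology for uncertainty evaluation in SciML that also considers several sources of uncertainty involved in the identification process. The uncertainties considered in the proposed method are the absence of theory and causal models, the sensitivity to data corruption or imperfection, and the computational effort. This makes it possible to provide an overall strategy for uncertainty-aware models in the SciML field. The methodology is validated through a case study in which a soft sensor is developed for a polymerization reactor. The results show that the identified soft sensor is robust to uncertainties, corroborating the consistency of the proposed approach.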
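The rapid development of social robots has stimulated active research in human motion modeling, interpretation and prediction, proactive collision avoidance, human-robot interaction, and coexistence in shared spaces. Modern approaches to these goals require high-quality datasets for training and evaluation. However, most available datasets suffer from either inaccurate tracking data or unnatural, scripted behavior of the tracked people. This paper attempts to fill this gap by providing high-quality tracking information from motion capture, eye-gaze trackers, and on-board robot sensors in a semantically rich environment. To induce natural behavior in the recorded participants, we use loosely scripted task assignments, which lead the participants to navigate through the dynamic laboratory environment in a natural and purposeful way. The motion dataset presented in this paper sets a high quality standard, as the realistic and accurate data is enhanced with semantic information, enabling the development of new algorithms that rely not only on tracking information but also on contextual cues from the moving agents and from the static and dynamic environment.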
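Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. Depending on the machine learning algorithm, initialization can lead to variability. Furthermore, distortions can be misleading with regard to cluster geometry. Among the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues to text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology, since similar procedures go by different names. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, factorization, and clustering algorithms that are directly or indirectly related to the reviewed works.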
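Information retrieval from brain responses to auditory and visual stimuli has shown success through the classification of song names and image classes presented to participants while EEG signals were recorded. Information retrieval in the form of reconstructing auditory stimuli has also shown some success, but here we improve on previous methods by reconstructing music stimuli well enough to be perceived and identified independently. Furthermore, deep learning models were trained on time-aligned music stimulus spectra for each corresponding one-second window of EEG recording, which greatly reduces the extraction steps needed compared with prior studies. The NMED-TEMPO and NMED-HINDI datasets of participants passively listening to full-length songs were used to train and validate convolutional neural network (CNN) regressors. The efficacy of raw voltage versus power spectrum inputs and of linear versus mel spectrograms was tested, and all inputs and outputs were converted into 2D images. The quality of the reconstructed spectrograms was assessed by training classifiers, which reached 81% accuracy for mel spectrograms and 72% for linear spectrograms (10% chance accuracy). Finally, in a two-alternative match-to-sample task, listeners discriminated reconstructions of auditory music stimuli with an 85% success rate (50% chance).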
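For readers unfamiliar with the difference between linear and mel spectrograms, the sketch below shows one widely used Hz-to-mel mapping (the 2595 and 700 constants variant) and how band edges spaced equally on the mel scale become wider in Hz at higher frequencies. Which mel variant and how many bands the study used is not stated above, so the formula choice, the function names, and the example parameters are assumptions made for illustration.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (common 2595 * log10(1 + f / 700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 1 frequencies (Hz) that are equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]


# Hypothetical example: 8 bands between 0 Hz and 8 kHz.
edges = mel_band_edges(0.0, 8000.0, 8)
print([round(e, 1) for e in edges])
```

A linear spectrogram keeps equal-width bins in Hz, while a mel spectrogram pools those bins into bands like the ones above, giving finer resolution at low frequencies.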
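When vision-based approaches are used to classify individual parking spaces as occupied or empty, human experts often need to annotate the locations and label a training set of images collected in the target parking lot in order to fine-tune the system. We propose to investigate three annotation types (polygons, bounding boxes, and fixed-size squares), which provide different data representations of the parking spaces. The rationale is to clarify the best trade-off between manual annotation precision and model performance. We also investigate the number of annotated parking spaces needed to fine-tune a pre-trained model in the target parking lot. Experiments on the PKLOT dataset show that a model can be fine-tuned to the target parking lot with fewer than 1,000 labeled samples, using low-precision annotations such as fixed-size squares.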
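The three annotation types can be related to one another with a few lines of code. The sketch below, under stated assumptions, derives an axis-aligned bounding box and a fixed-size square from a polygon given as a list of `(x, y)` vertices. Centering the square on the mean of the vertices, the function names, and the coordinate values are choices made for this example; the abstract does not describe how the annotations were actually produced or stored.

```python
def bounding_box(polygon):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a vertex list."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def fixed_size_square(polygon, side):
    """Square of a given side length, centered on the mean of the polygon's vertices."""
    cx = sum(x for x, _ in polygon) / len(polygon)
    cy = sum(y for _, y in polygon) / len(polygon)
    half = side / 2.0
    return (cx - half, cy - half, cx + half, cy + half)


# Hypothetical parking-space outline in pixel coordinates.
space = [(10, 20), (50, 22), (48, 60), (8, 58)]

print(bounding_box(space))
print(fixed_size_square(space, side=32))
```

The polygon is the most precise and the most costly to draw, while the fixed-size square needs only a single click per space, which is why the trade-off against model performance is worth measuring.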
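Ellipsometry techniques make it possible to measure the polarization information of materials, but they require precise rotations of optical components under different light and sensor configurations. This results in cumbersome capture devices that must be carefully calibrated under laboratory conditions, and in very long acquisition times, usually on the order of a few days per object. Recent techniques can capture polarimetric reflectance information, but they are either limited to a single view or cover all view directions only for spherical objects made of a single homogeneous material. We present sparse ellipsometry, a portable polarimetric acquisition method that captures polarimetric SVBRDF and 3D shape simultaneously. Our handheld device consists of off-the-shelf, fixed optical components. The total acquisition time per object is around twenty minutes rather than days. We develop a complete polarimetric SVBRDF model that includes diffuse and specular components as well as single scattering, and we devise a novel polarimetric inverse rendering algorithm with data augmentation of specular reflection samples via generative modeling. Our results show strong agreement with a recent ground-truth dataset of polarimetric BRDFs captured from real-world objects.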